Reliability and efficiency of algorithms for computing the significance of the Mann–Whitney test

Authors

  • Niranjan Nagarajan
  • Uri Keich
Abstract

Motivated by recent applications of the Mann–Whitney U test to large data sets, we took a critical look at current methods for computing its significance. Surprisingly, we found that the two fastest and most popular tools for exact computation of the test significance, Dinneen and Blakesley’s and Harding’s, can exhibit large numerical errors even in moderately large data sets. Another method, proposed by Pagano and Tritchler, suffers from a similar numerical instability and can also produce inaccurate results. This motivated our development of a new algorithm, mw-sFFT, for the exact computation of the Mann–Whitney test with no ties. Among the class of exact algorithms that are numerically stable, mw-sFFT has the best complexity: O(m^2 n) versus O(m^2 n^2) for the others, where m and n are the two sample sizes. This asymptotic efficiency is also reflected in the practical runtime of the algorithm. We also present a rigorous analysis of the propagation of numerical errors in mw-sFFT to derive an error guarantee for the values computed by the algorithm. The reliability and efficiency of mw-sFFT make it a valuable tool in computational applications, and we plan to provide open-source libraries for it in C++ and Matlab.
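To make "exact computation of the test significance" concrete, here is a minimal sketch of the classical counting recurrence for the null distribution of U when there are no ties. It is not mw-sFFT and not any of the methods named in the abstract; it is only a reference baseline that avoids floating-point error by using Python's arbitrary-precision integers. The function names `u_null_counts` and `exact_lower_p` are hypothetical and chosen for illustration. Filling the table this way takes on the order of m^2 n^2 integer additions.

```python
from fractions import Fraction
from math import comb


def u_null_counts(m, n):
    """Return counts where counts[u] is the number of orderings of the pooled
    sample (m first-sample and n second-sample values, no ties) giving U = u.

    Recurrence on the largest pooled observation:
      - if it comes from the first sample, it exceeds all j second-sample
        values, so it adds j to U and leaves sizes (i - 1, j);
      - if it comes from the second sample, it adds nothing and leaves
        sizes (i, j - 1).
    """
    # prev[j] holds the counts for sizes (i - 1, j); for i - 1 = 0 only U = 0.
    prev = [[1] for _ in range(n + 1)]
    for i in range(1, m + 1):
        cur = [[1]]  # sizes (i, 0): only U = 0 is possible
        for j in range(1, n + 1):
            counts = [0] * (i * j + 1)
            for u, c in enumerate(prev[j]):      # largest value from sample 1
                counts[u + j] += c
            for u, c in enumerate(cur[j - 1]):   # largest value from sample 2
                counts[u] += c
            cur.append(counts)
        prev = cur
    return prev[n]


def exact_lower_p(u, m, n):
    """Exact one-sided probability P(U <= u) under the null, as a Fraction."""
    if not 0 <= u <= m * n:
        raise ValueError("u must lie between 0 and m * n")
    counts = u_null_counts(m, n)
    return Fraction(sum(counts[: u + 1]), comb(m + n, m))


if __name__ == "__main__":
    print(u_null_counts(2, 2))
    print(exact_lower_p(0, 2, 2))
```

Because every count is an exact integer, the result carries no rounding error, at the cost of the quadratic-in-both-sizes running time that faster methods aim to avoid.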


Similar resources

Improving the palbimm scheduling algorithm for fault tolerance in cloud computing

Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...


Designing, validation, and reliability assessment of software to acquire kinematics parameters of motion by image processing

Motion analysis systems are useful and effective equipment in biomechanics research. Unfortunately, these systems are available to few researchers because they are expensive. The aim of this study was to design and validate practical and inexpensive software to determine the exact marker positions in space and compute the kinematics of movement. In designing the software, the ...


Consumer Nationalism and its relation with Patriotism and World mindedness in Assessment of Domestic and Foreign Sporting Goods

Nationalism in consumption is a form of economic nationalism that shapes consumer beliefs about the appropriateness, or indeed the morality, of purchasing goods, and it has been the subject of many investigations. In this study, according to the Krejcie & Morgan chart, 278 cases were selected from among 965 undergraduate physical education students of selected universities in Tehran and Karaj as a researc...


A Novel Method for Impedance Calculation of Distance Relays Using Third Order Interpolation

All algorithms for impedance calculation use an analog-to-digital converter. The high accuracy of the impedance seen by a distance relay is an important factor in the correct isolation of the faulty part of power systems. To achieve this, a novel technique based on third order interpolation is used in this paper. According to this technique, the times and the values of the obtained samples a...




Publication date: 2008